12 July 2018

Content

  • Introduction to metabolomics
  • Preprocessing of LC-MS data in Bioconductor

Metabolomics?

  • Large-scale study of small molecules (metabolites) in a system (cell, tissue, organism).
  • Metabolites: intermediates and products of cellular processes.
  • Metabolome?
    • Genome: what can happen.
    • Transcriptome: what appears to be happening.
    • Proteome: what makes it happen.
    • Metabolome: what actually happened.
  • Metabolome influenced by genetic and environmental factors.

Metabolites? Metabolomics?

How can we measure metabolites?

  • Nuclear Magnetic Resonance (NMR) - not covered here.
  • Mass spectrometry (MS)-based metabolomics.
  • Targeted metabolomics: quantify a predefined set of known metabolites.
  • Untargeted metabolomics: measure as many metabolites as possible, without prior selection.

Mass Spectrometry (MS)

  • Problem: unable to distinguish between metabolites with the same mass-to-charge ratio (m/z).
  • Solution: separate metabolites prior to MS by another property.
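The problem above can be illustrated with a minimal base R sketch. Leucine and isoleucine are isomers with the same elemental composition (C6H13NO2), so their protonated ions have the same m/z. The helper name `mz_protonated` and the choice of the [M+H]+ ion are assumptions made for this illustration.

```r
# Monoisotopic masses of the elements used here, and the proton mass
atom_mass <- c(C = 12.000000, H = 1.00782503, N = 14.0030740, O = 15.9949146)
proton <- 1.007276

# m/z of the singly protonated ion [M+H]+ for an elemental composition
# (hypothetical helper, for illustration only)
mz_protonated <- function(formula) {
  sum(atom_mass[names(formula)] * formula) + proton
}

leucine    <- c(C = 6, H = 13, N = 1, O = 2)
isoleucine <- c(C = 6, H = 13, N = 1, O = 2)

mz_leucine    <- mz_protonated(leucine)
mz_isoleucine <- mz_protonated(isoleucine)

# Same m/z: MS alone cannot tell the two compounds apart
mz_leucine == mz_isoleucine
```

Both ions come out at an m/z of about 132.1019, which is why a separation step before MS is needed.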

Liquid Chromatography Mass Spectrometry (LC-MS)

LC separation

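  • Metabolites can be separated by physicochemical properties such as polarity or hydrophobicity.
  • Example: HILIC (hydrophilic interaction liquid chromatography), which separates polar compounds.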

LC-MS data preprocessing

  • Chromatographic peak detection
  • Alignment
  • Correspondence

Chromatographic peak detection

  • Aim: identify chromatographic peaks in the data.
  • Allow for peaks of different retention time (rt) widths.
  • Allow for some scatter in m/z.
  • Identify the correct peak boundaries.

Chromatographic peak detection

  • centWave [Tautenhahn et al. BMC Bioinformatics, 2008]:
    • Step 1: identify regions of interest.
    • Step 2: peak detection using continuous wavelet transform.
    • Allows detection of peaks with different widths.
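To illustrate why a wavelet transform copes with peaks of different widths, here is a minimal base R sketch. It is not the centWave implementation: the Mexican hat wavelet, the simulated chromatogram and the helper names (`mexican_hat`, `cwt_coef`, `best_scale`) are assumptions made for this illustration.

```r
# Mexican hat (Ricker) wavelet evaluated at offsets t for a given scale
mexican_hat <- function(t, scale) {
  x <- t / scale
  (1 - x^2) * exp(-x^2 / 2) / sqrt(scale)
}

# Wavelet coefficient of `signal` at retention time `center` for one scale
cwt_coef <- function(signal, rt, center, scale) {
  sum(signal * mexican_hat(rt - center, scale))
}

# Simulated chromatogram: a narrow peak at rt 30 and a wide peak at rt 70
rt <- seq(0, 100, by = 0.5)
signal <- 100 * exp(-(rt - 30)^2 / (2 * 1^2)) +
  100 * exp(-(rt - 70)^2 / (2 * 4^2))

# Scale with the strongest response at a given retention time
scales <- c(1, 2, 4, 8)
best_scale <- function(center) {
  coefs <- sapply(scales, function(s) cwt_coef(signal, rt, center, s))
  scales[which.max(coefs)]
}

best_scale_narrow <- best_scale(30)
best_scale_wide   <- best_scale(70)
```

The narrow peak responds most strongly at a smaller scale than the wide peak, so scanning several scales lets one procedure pick up both.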

Chromatographic peak detection

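  • Read the data with readMSData (MSnbase package).
  • xcms: the findChromPeaks function takes its settings as an algorithm-specific parameter object.
library(xcms)
# centWave settings: expected peak width range and signal-to-noise threshold
cwp <- CentWaveParam(peakwidth = c(2, 10), snthresh = 5)
# `data` is the object returned by readMSData
data <- findChromPeaks(data, param = cwp)
# Show the first three detected chromatographic peaks
head(chromPeaks(data), n = 3)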
##            mz    mzmin    mzmax     rt rtmin  rtmax     into     intb
## [1,] 114.0907 114.0899 114.0929  1.954 0.280  3.907 1559.829 1555.923
## [2,] 114.0913 114.0884 114.0929  5.860 4.465  8.650 1890.221 1885.757
## [3,] 114.0914 114.0899 114.0929 10.882 8.650 13.114 1950.953 1946.210
##          maxo  sn sample is_filled
## [1,] 584.9510 584      1         0
## [2,] 601.8881 601      1         0
## [3,] 691.9580 691      1         0

Alignment

  • Chromatography is subject to (random and systematic) noise.
  • The same analyte may elute at slightly different times in different samples.
  • The size of these shifts depends on the LC setup and also seems to be analyte dependent.
  • Many algorithms available [Smith et al. Brief Bioinformatics 2013].
  • Main assumption: analytes elute in the same order.
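The idea of retention time adjustment can be sketched in base R. This is a simplified illustration, not one of the published algorithms: the anchor peaks, the simulated drift and the helper name `adjust_rt` are assumptions, and a linear model stands in for whatever smoother a real method would use.

```r
# Retention times of the same anchor analytes in a reference run ...
ref_rt <- c(50, 120, 200, 310, 450)
# ... and in a sample whose shift grows with retention time (simulated)
sample_rt <- ref_rt + 2 + 0.01 * ref_rt

# Main assumption: the elution order is the same in both runs
stopifnot(identical(order(ref_rt), order(sample_rt)))

# Model the deviation from the reference as a function of retention time
deviation <- sample_rt - ref_rt
fit <- lm(deviation ~ sample_rt)

# Subtract the predicted deviation from any retention time of the sample
adjust_rt <- function(rt) {
  unname(rt - predict(fit, newdata = data.frame(sample_rt = rt)))
}

adjusted_rt <- adjust_rt(sample_rt)
```

After adjustment the anchor peaks of the sample fall back onto the reference retention times, and the same function can be applied to all other peaks of that sample.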

Correspondence

  • Aim: group chromatographic peaks across samples that are assumed to represent the same ion.
  • Depends on proper alignment.

TODO add content:
  • image describing the alignment
  • available methods
  • what is the result? (m/z - rt ranges)
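As a rough illustration of correspondence, the base R sketch below groups peaks whose m/z and retention times lie close together and reports one m/z - rt range per group. It is a simple gap-based grouping, not one of the methods implemented in xcms. The peak table, the tolerances and the helper names (`split_runs`, `group_peaks`) are assumptions made for this illustration.

```r
# Toy peak table: one row per chromatographic peak (values are made up)
peaks <- data.frame(
  sample = c(1, 2, 3, 1, 2, 3, 1),
  mz = c(114.0907, 114.0913, 114.0910, 205.0970, 205.0975, 205.0968, 114.0909),
  rt = c(10.1, 10.4, 9.9, 55.2, 55.0, 55.6, 80.3)
)

# Label runs of values whose sorted neighbours differ by at most `tol`
split_runs <- function(x, tol) {
  o <- order(x)
  run <- cumsum(c(TRUE, diff(x[o]) > tol))
  run[order(o)]
}

# Group first by m/z, then by retention time within each m/z bin
group_peaks <- function(peaks, mz_tol = 0.005, rt_tol = 2) {
  mz_bin <- split_runs(peaks$mz, mz_tol)
  rt_bin <- ave(peaks$rt, mz_bin, FUN = function(r) split_runs(r, rt_tol))
  as.integer(factor(paste(mz_bin, rt_bin)))
}

peaks$feature <- group_peaks(peaks)

# Result: one m/z - rt range per feature
features <- do.call(rbind, lapply(split(peaks, peaks$feature), function(p) {
  data.frame(mzmin = min(p$mz), mzmax = max(p$mz),
             rtmin = min(p$rt), rtmax = max(p$rt),
             n_samples = length(unique(p$sample)))
}))
```

The peak at rt 80.3 has the same m/z as the first group but elutes much later, so it ends up as a feature of its own. This also shows why correspondence depends on proper alignment: uncorrected retention time shifts larger than the tolerance would split a feature.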

Normalization

TODO add content:
  • what sources of variation?
  • Crucial to add QC samples to the experiment.
  • available methods (RUV, linear models).
  • MS runs are not very expensive, so running replicates, QC samples etc. is usually not a problem.
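One way QC samples and a linear model can be combined is sketched below in base R. This is a minimal illustration under strong assumptions (a single feature, a purely linear signal drift along the injection order, simulated values). It is not RUV, and the variable names are hypothetical.

```r
# Every third injection is a QC sample (same pooled material each time)
injection_order <- 1:12
is_qc <- rep(c(TRUE, FALSE, FALSE), times = 4)

# Simulated log2 abundance of one feature: linear drift along the run
# plus a biological offset in the study samples
biological <- ifelse(is_qc, 0, rep(c(0.5, -0.5), length.out = 12))
log2_abundance <- 10 - 0.05 * injection_order + biological

# Estimate the drift on the QC samples only
qc <- data.frame(order = injection_order[is_qc],
                 value = log2_abundance[is_qc])
fit <- lm(value ~ order, data = qc)

# Remove the estimated drift from all samples, keeping the QC mean level
drift <- predict(fit, newdata = data.frame(order = injection_order))
normalized <- log2_abundance - drift + mean(qc$value)
```

After the correction the QC samples share one abundance level, while the differences between study samples are left in place.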

Identification

  • Match compounds based on features’ m/z.
  • Lab-internal databases with approximate retention times for specific compounds.

TODO add content:
  • What databases are available?
  • Up and coming: compound db, similar to ensembldb and alike. Problem: unclear license situation.
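Matching on m/z can be sketched in base R as follows. The small compound table, the [M+H]+ ion assumption, the 10 ppm tolerance and the helper name `match_mz` are all assumptions made for this illustration. A real analysis would query a compound database instead.

```r
proton <- 1.007276

# Tiny stand-in for a compound database (monoisotopic masses)
compounds <- data.frame(
  name = c("Leucine", "Isoleucine", "Glucose", "Caffeine"),
  mass = c(131.09463, 131.09463, 180.06339, 194.08038),
  stringsAsFactors = FALSE
)
# Expected m/z of the singly protonated ion [M+H]+
compounds$mz <- compounds$mass + proton

# Names of all compounds whose m/z lies within `ppm` of the feature's m/z
match_mz <- function(mz, compounds, ppm = 10) {
  tol <- mz * ppm / 1e6
  compounds$name[abs(compounds$mz - mz) <= tol]
}

hits_a <- match_mz(132.1020, compounds)  # matches two isomers
hits_b <- match_mz(195.0877, compounds)  # matches one compound
hits_c <- match_mz(150.0000, compounds)  # no match
```

The first query returns both leucine and isoleucine: m/z alone cannot separate isomers, which is where retention time information from a lab-internal database helps.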

Afternoon lab

  • LC-MS data handling (MSnbase).
  • LC-MS data preprocessing using xcms.